The Orchestrator: The Compiler Driver
Think of the Compiler Driver (for example, gcc) as a grand conductor. It does not translate code itself; it invokes the preprocessor, compiler, assembler, and linker in turn, automating the transformation from human-readable source code to a binary executable. This journey, the Road to Execution, begins at Compile time and continues through Load time and Run time.
With Separate compilation, the driver translates main.c and sum.c independently. A change to one module does not require re-translating the entire project: only the modified file passes through the preprocessor (cpp), compiler (cc1), and assembler (as) again, after which the Linker (ld) merges the new Relocatable Object File with the unchanged ones to produce the executable.
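The split between the two modules can be sketched as follows. This is a minimal illustration, not the original project's code: the contents of main.c and sum.c are assumed, and `total_of_array` is a hypothetical helper standing in for the call that would normally sit inside `main()`. Both "files" are shown in one listing, with comments marking where each would begin.

```c
/* ---- sum.c (assumed contents): `gcc -c sum.c` produces sum.o ---- */
int sum(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* ---- main.c (assumed contents): `gcc -c main.c` produces main.o ---- */

/* Declaration only: main.o records `sum` as an undefined symbol,
 * and the linker later resolves it against the definition in sum.o. */
int sum(const int *a, int n);

int array[2] = {1, 2};

/* Hypothetical helper; in a real main.c this call would be in main(). */
int total_of_array(void)
{
    return sum(array, 2);
}

/* Once main.c also defines main(), the link step is:
 *     gcc -o prog main.o sum.o
 * After editing only sum.c, rerun `gcc -c sum.c` and the link step;
 * main.o is reused as-is. */
```

Running `gcc -v` on such a build prints the individual tool invocations the driver makes, which is a convenient way to see the cpp, cc1, as, and ld stages in order.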
Efficiency & The Memory Hierarchy
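Where the Linker places arrays such as grid or src (starting at grid[0][0] or src[0][0]) determines how they map onto cache lines, which in turn affects Throughput and Latency. The access pattern itself comes from the code rather than from the driver. With a 32-byte cache line, a Stride-1 reference pattern uses every byte of each line it loads, so only the first access to a line is a Cold miss. A Column-wise scan over rows longer than one cache line touches a different line on almost every access and can evict lines before they are reused. In advanced high-performance code, loop unrolling (for example a $4 \times 4$ unrolled loop, which handles four elements per iteration in four independent accumulators) exposes independent operations that the processor can overlap, which further helps hide Main memory to Cache delays and reduces the clock cycles the loop needs (0x32, 0x1, 0x4, 0x51).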
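The difference between these access patterns can be sketched in C. This is an illustrative example under stated assumptions: the dimensions `ROWS` and `COLS` and all three function names are hypothetical, and `int` is assumed to be 4 bytes, so one 32-byte cache line holds 32 / 4 = 8 consecutive elements of a row.

```c
#include <stddef.h>

#define ROWS 64   /* assumed dimensions, for illustration only */
#define COLS 64

/* Stride-1: C arrays are row-major, so the inner loop visits adjacent
 * addresses. With 4-byte ints and 32-byte lines, roughly one access in
 * eight misses on a cold cache. */
long sum_row_wise(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            total += grid[i][j];
    return total;
}

/* Column-wise scan: the inner loop jumps COLS ints between accesses,
 * so consecutive accesses land in different cache lines. */
long sum_column_wise(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            total += grid[i][j];
    return total;
}

/* 4 x 4 unrolling: four elements per iteration, accumulated into four
 * independent variables so the additions do not depend on each other. */
long sum_unrolled_4x4(const int *a, size_t n)
{
    long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    for (; i + 3 < n; i += 4) {
        acc0 += a[i];
        acc1 += a[i + 1];
        acc2 += a[i + 2];
        acc3 += a[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        acc0 += a[i];
    return acc0 + acc1 + acc2 + acc3;
}
```

All three functions compute the same sum; only the order of memory references and the amount of independent work per iteration differ. Any measured speed difference depends on the machine, the compiler flags, and the array size relative to the cache.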